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Abstract 

In astrophysics and meteorology there exist numerous situations where flows 
exhibit small velocities compared to the sound speed. To overcome the stringent 
timestep restrictions posed by the predominantly used explicit methods for integra- 
tion in time the Euler (or Navier-Stokes) equations are usually replaced by modi- 
fied versions. In astrophysics this is nearly exclusively the anelastic approximation. 
Kwatra et al. [19] have proposed a method with favourable time-step properties 
integrating the original equations (and thus allowing, for example, also the treat- 
ment of shocks). We describe the extension of the method to the Navier-Stokes 
and two-component equations. - However, when applying the extended method to 
problems in convection and double diffusive convection (semiconvection) we ran 
into numerical difficulties. We describe our procedure for stabilizing the method. 
We also investigate the behaviour of Kwatra et al.'s method for very low Mach 
numbers (down to Ma = 0.001) and point out its very favourable properties in this 
realm for situations where the explicit counterpart of this method returns abso- 
lutely unusable results. Furthermore, we show that the method strongly scales over 
3 orders of magnitude of processor cores and is limited only by the specific network 
structure of the high performance computer we use. 
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1 Introduction 



In astrophysics, meteorology and many other fields of physical and engineer- 
ing sciences problems are studied which are characterized by vastly different 
timescales. In fluid flow the smallest time scales are often set by diffusive pro- 
cesses, whereas the solution changes only on the much larger time scales set 
by the flow velocity or the sound speed. It is then general practice to make use 
of explicit time integration for the hyperbolic terms and to treat the diffusive 
terms implicitly. The latter task is facilitated by the fact that the resulting 
system of discretized equations is often linear and positive definite so that 
effective solution methods are available. 

Frequently, however, the different scales originate from the macroscopic flow 
speed (small) and the sound speed (large). In addition, the sound waves may be 
known to be of little physical significance. Such a situation is ubiquitous in me- 
teorology and geophysics (convection in the earth mantle). In stellar physics, 
these premises typically hold true for convection in the interior of stars, for 
example in the solar convection zone except for its outermost parts. Since an 
implicit time-advancement of the hyperbolic part is usually impractical due 
to, among others, a lack of efficient solvers for these terms, the usual solution 
strategy here is to eliminate the sound waves analytically and to discretize the 
modified equations only. A widely (in stellar physics: nearly exclusively) used 
way to accomplish that is to resort to the anelastic approximation [27] and 
its many variants or extensions. In meteorology, the use of so-called primitive 
equations is also widespread (cf.,pQ,e.g.). 

In [19] N. Kwatra et al. propose a semi-implicit method to solve Euler's equa- 
tions numerically. From a physical point of view the advantage of this approach 
is that the method avoids any simplifications on the side of the basic equations 
themselves. As a consequence of that it is possible to treat physical phenom- 
ena for which the anelastic methods are neither designed nor appropriate, 
e.g. shocks. Indeed, in [19J a number of shock tube problems are satisfacto- 
rily treated as test cases. Furthermore, it is outlined that this method has 
favourable properties in the low Mach number regime. On the numerical side, 
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the implicit part consists of the solution of an equation of a generalized Pois- 



son equation only (equation (32) in the present paper). For such equations, 
efficient solvers are quite readily available. In [TH], the method was developed 
in the framework of ENO type numerics. 

These properties made the method an interesting candidate for inclusion in 
our ANTARES code [26] which is designed to cope with flows encountered in 
stellar physics ranging from low to high Mach numbers, including shocks. The 
original development in terms of ENO methodology complied well with the 
fact that ANTARES makes use of this methodology also in the other methods 
included for spatial discretization. 

While after a few adaptions the Kwatra method performed well in the usual 
shock tube tests we performed, difficulties arose when applying its straight for- 
ward extension to the problem of semiconvection in stellar physics (double- 
diffusive convection; diffusive convection in oceanography) in the same way 
we had, earlier on, successfully tackled similar problems with standard ENO 
methods. In semiconvection possibly slow (in terms of sound speed) convective 
motions take place. Already they enforce the calculations to span a large time 
interval in order to arrive at a relaxed, statistically steady state. In addition, 
between zones of ordinary convection thin, more or less horizontal sheets are 
situated where the transport of helium in the star (salt in the ocean) is pro- 
vided by diffusive processes only, which may easily make for even longer time 
scales • Experience thus showed that the successful mastering of the shock 
tube problems as usually applied for validating numerical discretizations does 
not guarantee stability for long periods of time. The extended version of Kwa- 
tra's method, which basically consisted of adding the viscous terms to the 
explicit part of the method did not yield stable results. Indeed, severe insta- 
bilities appeared in the course of time which rendered the calculations useless. 
This was quite surprising since the explicit ENO-based method employing the 
same discretization led to satisfyingly stable results. It was therefore warranted 
to stabilize the original method so as to achieve stability even for long-term 
integrations. 

The plan of the paper is the following. We first give a short description of the 
basics of Kwatra et al.'s method. Subsequently, we derive a pressure evolution 
equation which, unlike the original equation, takes also dissipative terms into 
account (radiative transfer in the diffusion approximation as adequate for 
stellar interiors). We also consider the case of two-component flows (semi- 
or thermohaline convection). We then describe the difficulties which resulted 
when applying this method to the semiconvection problem. Next we turn to 
the enhancement of stability which solved the numerical problems and discuss 
a few simulations. 

Subsequently, we show that this method scales strongly over 3 orders of mag- 



3 



nitude of processor cores and is limited only by the specific network structure 
of the high performance method we use. This makes it suitable for high per- 
formance computing despite the necessity of solving an elliptic equation each 
timestep. 

Furthermore, we validate the method of Kwatra et al., by testing it in the 
high Mach number regime as well as its performance in low Mach number 
flows (down to Ma=0.001). In the latter regime, Kwatra's method shows very 
satisfying behaviour which contrasts grossly with the completely useless results 
of the standard explicit method. 

We conclude that, given the beneficial properties of this method, it poses a 
real alternative to the anelastic or Boussinesq approximations. 



2 Numerical method 



We model fluid flow by the set of compressible Navier-Stokes equations (NSE). 
A detailed derivation is found in pU] and [20]. For better legibility, all deriva- 
tions are presented in one spatial dimension. Hence, d/dx is, in this case, the 
one-dimensional divergence operator V. The generalization to higher dimen- 
sions is easily done. 



The compressible Navier-Stokes equations for a chemically homogeneous flow 
read 
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The state variables in the Navier-Stokes equations depend in principle on the 
spatial variable x and time t. The explicit dependencies are stated in Table 
[TJ In general, the radiative source term Q rad = V • F rad is determined as 
the stationary limit of the radiative transfer equation (see [23J for discussion 
and further references). However, we will only consider settings taking place 
in stellar regions which are optically thick, so this quantity can accurately 
be obtained by means of the diffusion approximation for radiative transfer, 

Qrad = V-F rad = -V-(iTVT). 



We now introduce our extended version of the semi-implicit numerical method 
of N. Kwatra et al. [TH]- To this end, we split the flux term in an advective 
part, a non-advective part, and a viscous part, 
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gas density 
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u(x, t) 


flow velocity 
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momentum density 
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- r(x,t) 


gas pressure 
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c(x, i) 


concentration of second species 






gravitational acceleration 
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viscous stress tensor for zero bulk viscosity 


V 




dynamic viscosity, appears in the definition of t 
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temperature 
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= K{p, T, c) 


radiative conductivity 




= K/{cpp) = K(i>PJ 


radiative diffusivity 


e = 


e(x, i) = e int 0, t) + e kin (x, t) 


total energy density, sum of internal and kinetic energy densities 
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adiabatic index (7 = 5/3) for an ideal gas 


K c = 


= K c (p,T, c) 


diffusion coefficient for species c 



Table 1 



Variables and parameters appearing in the model equations 
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The advective flux is calculated using the weighted essentially non-oscillatory 
(WENO) method proposed in [12]. This approach takes the computational val- 

X. 1 

ues provided as cell averages — t / v (() d( and reconstructs the numerical 

x i-i 

1 2 

fluxes at the cell interfaces by the means of an upwind biased interpolation 
operator. To ensure that every conservative variable is upwinded according to 
its characteristic speed, which is given by the corresponding eigenvalue of the 
Jacobian -DF ad , the upwinding is done in the eigensystem where the equations 
decouple. Due to the flux splitting approach the Jacobian of the advective flux 
F ad reads 
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Since all eigenvalues of are equal to u, we make use of the characteristic 
projection method (CPM) [3] and apply componentwise upwinding to F^U). 
However, Kwatra et al. found in [19] that a slight modification of the advective 
flux function F at } leads to less spurious osciallatory behaviour of the ENO 
scheme. They propose to replace the advection velocities with the MAC grid 
value defined at the point in question, i.e. replace F ad = (pjUj, PjU 2 , ejUj) with 

-Fad = (pjU i+1 / 2 , p j UjU i+ i/ 2 ,e j u i+1 / 2 ) with u i+1/2 = to construct 

the flux at Xi+i/2- They call this approach modified ENO (MENO). 

The viscous part F vls {U) is calculated using a fourth order finite difference 
scheme presented in Section |3J 

Thus, we update the advective and the viscous part and add the vector of body 
forces to obtain an intermediate value of the conservative variables p*, (pu)* 
and e*. Since the pressure does not affect the density p, we have p* = p n+1 . 
Here and in the sequel superscripts like n denote the timestep. 



As outlined in |19| . the non-advective momentum and energy updates are 



( P u) n+1 - ( pu y 
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Dividing Eq. (J3| by p n+1 and taking its divergence we obtain 



n+1 



V • u* - AtV 
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\p n+1 J 



(5) 



Similar to [TH], we use the pressure evolution equation for the Navier-Stokes 



equations derived below in Section 2.1 
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We take V • u to be at time n + 1 through the timestep and substitute into 
([5| to obtain 



dP ( VP \ 2 

_ +M . V P = -^V-^+^AtV- f —J - - (t* • Vr - V • (u ■ t) + V ■ (KVT)) 

P P (7) 

This is an advection-diffusion equation with a source term. We discretize the 
advective term and the source term using an explicit Euler-forward timestep 
and define the pressure in the diffusion term V • (VP/ p n+1 ) implicitly at t n+l . 
Apart from u*, the velocity is taken at the old timestep, u n . This leads to a 
general elliptic equation 



gpn+l _ y . ( KV pn+l) = f (g) 

where 



« = At ( 10 ) 

p n+l \ I 

P n - Atu n - VP" 1 

{u n ■ V • r n - V • {u n ■ r n ) + V • {KVT n )) (11) 



3At (p n ) 2 (c™y 



As it is common in projection-like methods, we prescribe von Neumann bound- 
ary conditions, 



Q pn+l 

'an= 0. (12) 



dn 

n denotes the outward pointing unit normal. 



Taking over the formalism of the dual cell of [19] we interpolate the pressure 
to the cell interfaces via 



p i _ Pj+lPi + PjfH+l Qo\ 

i+ 5 ~ Pi + p i+ i 
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and update equations ^ and Q using a central difference quotient. 



However, the pressure P obtained form equation (|8j) is only an accurate pre- 
diction of the thermodynamical pressure at time n + 1. Therefore, after the 
completion of the momentum and energy update, the predicted pressure is 
discarded and the pressure at the new timestep is determined from the equa- 
tion of state. This ensures that the pressure is always perfectly consistent with 
density and temperature values. 



2.1 The Pressure Evolution Equation for a general EOS 



Basically, the derivation of the pressure evolution equation follows the lines 
of jl], but instead of the Euler equations, we use the NSE. Although we start 
out using a general equation of state (EOS), the final shape of the pressure 
evolution equation depends on the equation of state considered, which is in 
our case the equation of state for ideal gas. However, a corresponding pressure 
evolution equation may be derived for any EOS. 

We consider the general equation of state P = P(p, e) where e = e int /p. Taking 
the total derivative we have 



DP (dP\ Dp + (dP\ De 



Dt \dp Dt \de Dt 
As it is common in thermodynamical notation, the subscripts in equation 



(14) denote the thermodynamic state variable which is held constant at the 



evaluation of the differential expression. 

Analysis of the Navier-Stokes equations shows that 

^ = -/>V-« (15) 



De _ -u ■ (V • t) - P(V • u) + V • (u ■ r) - V • (KVT) 
~Dt~ p 



(16) 



Using this and the general definition of the sound speed, := (§^) 5 
(fj^J + ' wnere S is the specific entropy, we arrive at 
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dP ( dP\ 1 

_ +U -WP = -(y-u)pc 2 s -y— j -( M -Vr-V-(«-r) + V-(OT)) (17) 

The partial derivative (ff) is calculated using the equation of state. In case 
of ideal gas the EOS relating the pressure P to the internal energy e reads 



n 2 „ (dP\ 2 

P = -ep , so naturally, we have -—- = -p 
3 \ oe / 3 
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Thus we arrive at equation 



2.2 Extension to a two-component flow 



To model the behaviour of a two component fluid we add a concentration 
equation to the Navier-Stokes equations: 

g- t (P c ) + ^P CU ) = V • (P^cVc) (18) 



Splitting the fluxes as before leads to 
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(19) 



equal u, we apply the MENO algorithm to F^iU) and evaluate F vis2 as 
outlined for the one-component case. 
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Similar to the continuity equation, equation (18) is unaffected by the pressure, 
so (pc)* = (pc) n+1 . 



Basically, the momentum and energy updates @, Q are carried out as before, 
but since the pressure depends on the mass fraction c, the pressure evolution 
equation ^ takes a slightly different appearance. 



Since P = P(p, £,c), equation (14) reads 
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Enforcing mass conservation, analysis of equation (18) shows that 
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We use the equation of state for a perfect gas 



RgaspT 



(22) 



where p denotes the mean molecular weight of our compound. It is related to 

-. atmi and atni2 refer to the atomic 



the mass fraction c by u = —r- — -r, — r-, , 

J ™ c /atmi+(l— c) /atm2 

weight of the constituents of the fluid. In our case, we usually model a mixture 
of hydrogen and helium, c denotes the mass fraction of helium. Therefore, we 
have atni! = 4 being the atomic weight of helium whereas atm 2 = 1 is the 
atomic weight of hydrogen (in a non-ionized state). It is straightforward to 
include electron pressure in the EOS for a partially or a fully ionized gas. 
Thus, we arrive at 




RgaspT 



(23) 



Whence the pressure evolution equation ^ reads 
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We can thus use the discretization technique employed in equations ^ and 
(JTJ) to proceed from timestep n to n + 1 and rewrite equation (11) such that 
the right hand side becomes 
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(25) 



S.5 T/ie Algorithm 



In the end, the fractional step algorithm presented in the previous sections 
consists of the following steps: 

(i) We start at timestep n with values p n , (c) n (pu) n , (e) n ,P n . We calculate 
the velocity at the cell interfaces via 



{pU)i + (pti) iH 



(26) 



Pi + Pi+i 

and determine the modified flux function F ad j = (pjMj+1/2, PjCjU i+ i/ 2 , PjUjU i+ i/ 2 , ejU i+ i/ 2 ) T . 

(ii) We determine J|-F a d using the CPM procedure. -§^F vis is calculated using 
a finite-difference approach. 

(iii) We determine the density at the new timestep as well as intermediate 
values for momentum and total energy via 



/ n n + l \ ( p n \ 
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iv) Using equation (13), we interpolate P n to the cell interfaces and apply 



a central difference quotient to calculate VP n , 
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Similarly, V • u* is determined by interpolating (pu)*/p n+1 = u* to the 



cell interfaces as in equation (26) and calculate 



U i+l/2 U i-l/2 

Ax 



(29) 



(v) 



(vi) 



vn 



We now determine the coefficient functions c(x) and k(x) according to 
equations ^ and (10). The right hand side f(x) is calculated according 
to (11) or (25), respectively. 



We solve the generalized Poisson equation (32) prescribing von Neumann 
boundary conditions (12) and obtain the predicted pressure Pl^j. 



We determine VP^" d analogously to VP n in equation (28) in step (iv) 
and perform the momentum update as prescribed by equation 



(viii) Following the suggestion of N. Kwatra et al. in [19] . the cell face velocity 
u i+i/2 * s updated separately to w^+i/s v * a 
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(ix) We update the total energy e as prescribed in equation Q, 



e n+l _ e * 
At 



-V ■ (PpredW 



,n+l 
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(31) 



As usual, the gradients are determined via the central difference quotient 
using the interpolated values at the cell interfaces. 



(x) We determine the EOS pressure at time n + 1 and set P 



n+1 



pn+1 
^EOS- 



2.4 A Solver for the Generalized Poisson Equation 



The fractional step method presented above relies on the prediction of the 
pressure at the next time step. This prediction is obtained by evaluating a 
Poisson type equation, 



V • (k(x)V$(z)) + c(x)*(x) = f(x). 



(32) 



It is crucial for the performance of the overall method to have an efficient 
numerical solver for equations of this type at hand. To fit into the existing 
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ANTARES framework this solver is required to run in parallel and scale effi- 
ciently. 



From an analytical point of view existence and uniqueness of a solution to 



equation (32) is guaranteed, if the coefficient functions k(x) and c(x) are 
strictly positive and bounded over the entire domain. Due to ^ and (10) 
this can be ensured for any physically meaningful problem where p > 0. The 
conditions on n(x) and c(x) ensure the ellipticity and coercivity of the linear 
operator L = — V • (/c(sc)V) + c(x) and therefore, we may invoke the theorem 
of Lax-Milgram which guarantees the existence of a unique weak solution of 



(32) for any square-integrable f(x). A general theoretical treatment of second 



order elliptic partial differential equations is found in [2J, whereas in [8] the 



special case of (32) is analysed. 



Various techniques are available to discretize equation (32). The discretization 
process transforms the partial differential equation into a system of linear 
equations. The solution of the linear system is the numerical approximation 



to the analytical solution of (32). 



For discretization we preferred a finite element ansatz over a finite difference 
scheme, since the finite element technique leads to a symmetric and positive 
definite system matrix. A solution of the resulting linear system is determined 
by means of the preconditioned Conjugate Gradient algorithm. As a precondi- 
tioner we use the incomplete Cholesky decomposition with fill-in. A detailed 
description of the methods is found in [8]. 



2.4-1 Scalability of the Poisson Solver 

To test the scaling capability of our solver we solve the two-dimensional gen- 
eralized Poisson equation (32) with k(x), c(x), f(x) as specified in (J9|-(ll) on 



a rectangular domain discretized with 800 x 800 grid points and employ a 
varying number of computational cores. The results are given in Table [2} 

We point out that the run on a single core is algorithmically different, since 
in this case the generalized Poisson equation is solved using (only) the Con- 
jugate Gradient algorithm. The Schur complement algorithm is first required 
solving the generalized Poisson equation on two cores and induces consider- 
able overhead. The wallclock times indicate that this overhead slows down the 
calculation by a factor of 12, i.e. employing the Schur complement method 
and solving the generalized Poisson equation on 12 cores takes as long as 
solving it on a single core employing just the Conjugate Gradient algorithm. 
Therefore, in our test the first real speedup is achieved employing 16 cores. 
However, the results also show that the wallclock time decreases linearly be- 
yond that. Hence, except for a constant factor, the Schur complement method 
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scales optimally over three orders of magnitude in number of cores. 



# cores 


time in s 


# cores 


time in s 


1 


1.000 


64 


0.168 


2 


6.033 


128 


0.096 


4 


3.109 


256 


0.043 


8 


1.361 


512 


0.021 


16 


0.695 


1024 


0.016 


32 


0.345 







Table 2 

Poisson solver scaling test: calculated at the IBM Power6 575 System at RZG. Due 
to the considerable overhead induced by the Schur complement method the first 
real speedup compared to the (algorithmically different) single-core run is achieved 
employing 16 cores. 



2.4-2 Stability of the Poisson Solver 



In this test, we solve equation (32 ) with k(x, y) = 1, c(x, y) = 1 and f(x, y) = 
and apply Neumann boundary conditions in x-direction and periodic bound- 
ary conditions in y-direction. The analytical solution of 



-Acf> + 



dn 



\x=Q 



dn 



x=l 



is <f)(x,y) = e x , x,y e [0, 1]. 

To ascertain whether our solver is capable of producing smooth results even if 
slight oscillations are present in the coefficient functions we introduce random 
perturbations of the order 10 -2 in k, c, and /. 

The results of this test are depicted in Figure [T] and quantified in Table [3j It 
seems that the perturbation of the right hand side / has almost no influence 
on the numerical solution since the error of the perturbed equation is of the 
same magnitude as the error of the unperturbed setting. 

Perturbing k and c, however, does lead to an increase in the error of about 
one magnitude. Nevertheless, also in those cases the error is still below the 
discretization error and we need to keep in mind that the perturbation applied 
is several orders of magnitude larger than any oscillation introduced by round- 
off errors. 
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Fig. 1. Absolute error for a fixed grid point index in y direction. 



error 


L 1 norm 


L 2 norm 


L°° norm 


unperturbed, 1 core 
unperturbed, 4 cores 
/ perturbed, 1 core 
/ perturbed, 4 cores 


1.4341 x 1(T 5 
1.4329 x 1(T 5 
2.8822 x 1(T 5 
4.5153 x 1(T 5 


1.6377 x 10~ 5 
1.6368 x 10" 5 
2.9599 x 10~ 5 
4.6324 x 10" 5 


3.0362 x 10~ 5 
3.0362 x 10~ 5 
4.9197 x 10" 5 
6.5409 x 10~ 5 


k perturbed, 1 core 
c perturbed, 1 core 


10.946 x 1(T 5 
10.747 x 10" 5 


11.130 x 10" 5 
10.912 x 10" 5 


23.476 x 10~ 5 
15.839 x 10~ 5 



Table 3 

Poisson solver stability test: The numerical error in comparison with the analytical 
solution. 

2.5 Time Stepping 

For time integration, the ANTARES framework provides the second and third 
order TVD Runge-Kutta schemes of [7] as well as the second order three 
stage scheme of Kraaijevanger [IS] . N. Kwatra et al. suggested in [19] two 
variations of the temporal integration with Runge-Kutta methods. The first 
is to perform Runge-Kutta on just F^U) and F vis (U) with one final implicit 
integration of F nona d(U). For the second variation they suggest to integrate 
Fad(U), F v i S (U) and -F nona d(?7) for each Runge-Kutta stage. In [193 a bett er 
performance is reported employing the second variation. Since this coincides 
with our observations, we use the second variation in all our simulations. 

In [12] a Courant-Friedrich-Levy condition based on an estimate of the maxi- 
mum value of \u\ and the pressure gradient VP throughout the next timestep 
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is derived, 



At | ' lmax P ) < 1. (33) 



The term corresponding to the pressure gradient can be neglected in the limit 
where Ax — > 0, since its contributions are of lower order (see [3T] for a discus- 
sion of the treatment of lower order terms in stability analysis). In that case, 
we arrive at a timestep restriction exclusively due to the velocity of the flow, 



r fl uid = CFL adv -^- (34) 

I ^ I max 

Further restrictions on the timestep At are imposed by heat diffusion tt and 
the viscosity r visc . In case of a two component flow, the diffusion of the second 
component also leads to a timestep restriction, r c . 

We define our timestep as 



At = min {r T , r visc , r fluid (, r c )} (35) 

where 



Ax 2 Ax 2 Ax 2 
t c = CFLdifi ■ , tt = CFLdis • , T v isc = CFL^m ■ . 

CFL di ff and CFL adv denote Courant numbers not necessarily equal. 

The maximum Courant numbers CFL advmax and CFL difTmax may be deter- 
mined by (linear) von Neumann stability analysis (see, for example [31]). How- 
ever, in case of CFL adv max , the nonlinear WENO scheme must be linearized 
and the maximum Courant number obtained by linear stability analysis pro- 
vides just an estimate. In fact, the maximum Courant number of WEN05 com- 
bined with the third order TVD Runge-Kutta method is predicted to be 1.43, 
but simple tests indicate that this solver becomes unstable at CFL adv = 1.0 
(sec [24J and [32J). Thus, in practice we make sure to choose a Courant number 
smaller than the one theoretically possible. 

The source term on the right hand side of equation Q, which represents buoy- 
ancy forces acting on the flow, also poses a restriction on the timestep, but 
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similarly to the restriction caused by the pressure gradient above its contri- 
butions are of lower order and hence typically only provide an accuracy but 
not a stability restriction. 



3 Simulations in the Astrophysical Regime 

3.1 Physical Setting 



To set up a model of stellar convection, we follow the ansatz of [25] and [31] by 
considering a hydrostatically layered fluid column which is unstable against 
convection. In terms of the Schwarzschild criterion of convective instability 
[33J with x pointing along the direction of gravity, this reads 



9T >m . (36) 

ad 



dx \ dx 



From the outset we assume the gas to be ideal gas with a 7-law where 7 = 5/3. 
The volume expansion coefficient is given by a = 1/(7 — 1) and the specific 
heat at constant pressure is cp = i2 gas (l + ot). 

These data suffice to determine the adiabatic temperature gradient 

= — (37) 
c P 

once we have specified the downward pointing constant gravity g. 

We fix the temperature at the top and define a function b(x) to be the ratio 
of the actual temperature gradient to the adiabatic one: 

dT ( dT \ 38 

dx \ dx J ad 




We set b(x) = a with a > 1 and integrate equation (38) straightforwardly. 
This way the temperature - and therefore the energy - is specified for the 
entire domain. 

We determine the density at the top and eliminate the pressure P in the 
equation of hydrostatic equilibrium. Simple integration yields the density for 



the entire domain. The pressure P is obtained from the equation of state (22). 
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To complete our model we need to specify the viscosity coefficient 77 and the 
radiative conductivity k. This is achieved by prescribing the Rayleigh number 
Ra and the Prandtl number Pr = cj>r]/K. The former quantities arise in the 
definition of the starting model only, since in our model we assume K(x, t) 
and r)(x,t) to be constant. This setup simplifies studies of the basic physics 
while it is still useful for insight into in the astrophysically relevant case. 



To start dynamics away from equilibrium we apply a random initial pertur- 
bation. 



3.1.1 Starting model for the semiconvection problem 

Double-diffusive convection is a phenomenon encountered in chemically inho- 
mogeneous flows where a significant mean molecular weight gradient is present. 

Semiconvection is the most important special case of double-diffusive con- 
vection in astrophysics. Models of stellar structure and evolution predict set- 
tings where the heavier product of nuclear fusion provides stability to a zone 
which otherwise would be unstable to convective overturning, because tem- 
perature sufficiently rapidly decreases against the direction of gravity. Such a 
zone would become convective, if its composition were mixed, and the question 
whether such a zone should be treated as if it were mixed or not has become 
known as the semiconvection problem [35J. 



To model a chemically inhomogeneous fluid we need to add an equation de- 
scribing the dynamics of the second species to our set of equations. The corre- 
sponding partial density equation has been introduced in Section [2} The start- 
ing model for a semiconvective simulation has been developed by F. Zaussinger 
[21] and is set on top of the starting model for stellar convection by expanding 
it with the specification of the Lewis Number Le = cppn c /K, which relates 
the radiative conductivity K to the diffusion coefficient k c as well as with the 
stability parameter R p . The stability parameter is defined as the ratio of the 
gradient of mean molecular weight to the superadiabatic temperature gradient 



(which itself is given by b(x)-l from equation (38). Further details are found 
in 



3.2 Simulation Setting 



Due to the compressible nature of the flows we model, we specify the vertical 
extent of the simulation domain in multiples of the pressure scale height Hp = 
P/(pg). For the simulations presented the domain always covers 1 H P . 
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To determine the resolution required to resolve even the smallest structures in 
our simulations - namely, to perform a direct numerical simulation - we resort 
to mixing length theory. In [31], a correlation between the Rayleigh number 
and the diffusion coefficients is derived and a balance between advection by 
the interior flow and the diffusion across an interface is used to estimate the 
boundary layer thickness 5t as 

5 T = Ra*~* H, (39) 

where H denotes the (physical) height of the domain and Ra* = Ra ■ Pr is the 
modified Rayleigh number. If H denotes the number of grid points along the 
vertical direction, then St is the number of grid points resolving the thermal 
boundary layer. 

Similarly, in case of double-diffusive convection, the boundary layer thickness 
of the second species 5 C follows from the definition of the Lewis Number and 
scales with the thermal boundary layer thickness, 

5 C = Tie 6 T . (40) 
The same conclusion can be drawn for the viscous boundary layer, 

6 yilK = y/Pi6 T , (41) 

using here the definition of the Prandtl number Pr. 

In our simulations we seek to resolve the boundary layers with a minimum of 
5 grid points. 

The system of equations is discretized on a rectangular, equidistantly spaced 
grid. Boundary conditions are based on the assumption that all quantities 
are periodic in the horizontal direction. Moreover, for the hydrodynamical 
equations, we employ "closed" (Dirichlet) boundary conditions in the vertical 
direction. 

The simulation time is measured in units of sound crossing times (scrt). One 
sound crossing time is defined as the time taking an acoustic wave to propagate 
from the bottom to the top of the simulation box. 
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X-Momentum at constant depth 
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Fig. 2. Simulation of double-diffusive convection. The upper picture shows a hori- 
zontal slice of the momentum in x-direction after 47 scrt. Already, two-point insta- 
bilities have appeared. 

3.3 A Numerical Instability 

After ascertaining the basic properties of the numerical method as outlined 
in [19], we extended the method to two-component viscous flow and turned 
to simulations of stellar convection and stellar semiconvection. Although for 
some parameter sets the extended method performed well and exhibited the 
higher timestep expected from it, for other parameter sets spurious oscillations 
appeared in the numerical solution. 

In Figure [2] we show a snapshot of a simulation of double-diffusive convection. 
The simulation parameters are Pr = 1.0, Le = 1.0, R p = 1.1 and Ra* = 
160000. We have a spatial resolution of 100 x 100 grid points and choose a 
CFL Number of CFL^ = CFL^m = 0.15. After a certain time, two-point 
instabilities appear in the numerical solution, propagate over the entire domain 
and render the simulation useless. 

The appearance of the instabilities was quite surprising since the explicit coun- 
terpart of the method did not show any similar instability effects. We therefore 
strived to enhance the long-term stability of our solver. 



4 Enhancing Stability: Dissipative Spatial Discretization 



In the ANTARES framework, the second-order terms have been evaluated via 
a finite difference scheme of fourth order [2"B"fT5] , 
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This stencil satisfies the properties of flux conservation, easy generalization to 
more spatial dimensions through directional splitting and easy implementation 
into conservative numerical schemes for the hyperbolic part of the hydrody- 
namical equations. However, the following analysis shows that this stencil lacks 
a crucial property: strict dissipativity. 



4-1 Dissipativity Analysis 

A strictly dissipative finite difference scheme efficiently damps high-frequency 
oscillations in the numerical solution. To determine whether our scheme is 
strictly dissipative or not, we resort to dissipativity analysis (see [31] for the 
general idea). 

The crucial quantity studied in this context is the amplification factor g = 
g{0). The propagation of the numerical solution by one time step corresponds 
to the multiplication of the Fourier transform of the numerical solution by this 
factor g. Thus, the magnitude of the amplification factor equals the factor by 
which the amplitude of each frequency in the solution is increased or decreased 
during each time step. Hence, the finite difference scheme is strictly dissipative 
if \g(0)\ < 1 for all 9 G [— 7r,7r] and g(0) = 1. In general, 9 depends on the 
timestep At and, therefore, g{9) depends on the Runge-Kutta method used 
for time integration. 

The detailed analysis is given in |15j . so we will limit ourselves to the results. 
It turns out that for both the second and third order TVD Runge-Kutta 
methods, g(9) = 1 for 9 G {— 7r,0,7r} while \g(9)\ < 1 for sufficiently small 
At for all other 9 G [— 7r, it]. Thus, both schemes reduce the magnitude of 
most frequencies (given an accordingly chosen timestep) but not the highest 
frequencies appearing on the grid scale. This is essentially the property we 
lack in our simulations. 

Therefore, we here derive a finite difference scheme which is dissipative, of 
fourth order and consistent with the ENO spatial discretization employed for 
the advective terms of the Navier-Stokes equations. 
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4-2 Derivation of the finite difference discretization from the WENO ap- 
proach 



For the convenience of the reader, we repeat the derivation suggested in [T5] . 

In the WENO-approach [29] the Ui are assumed to be cell averages of a 
function v(x), 

U f = - J v(Q d( (42) 



X. 1 



Approximating v(x) with a polynomial function p(x) to specified order, p(x) = 
v(x) + 0(h 5 ), one obtains 

{Udx = l(p(x t+ ,)-p( Xl _,)) + 0(h 5 ) (43) 

as long as the functions are smooth enough to give an extra h in the difference 
to cancel that one in the denominator. Clearly, using p(x i+ i/ 2 ) as approxi- 
mation to Ui+i/2 has a lower order of accuracy, the higher order comes from 
perfect cancellation when building the difference. Now using p'{xi±i/2) for 
building the 2nd derivative one obtains 

p'{x i+ i) -p'ix^i) 
{Ui) xx = 1+2 \ (44) 

Centered 4-point stencil approximation leads to 

U^! - 15 Ui + 15 U i+1 -U i+2 2 
Pi+\ = Y2h + °( h ), ( 45 ) 

and 

Ui-2 — 15 Ui-i + 15 Ui — U i+ i . 2 . 
Pi-| = Y2h + °(M, (46) 

where the order of accuracy is measured with respect to (Ui±i/ 2 )x- The calcu- 
lation was done using a Mathematica 4.1 script. The second derivative is then 
given by 



W). - -^ + 16 ^'-^ + 16 ^-^ + 0(*') (47) 
which corresponds to the standard centered 5-point stencil approximation. 



The corresponding dissipativity analysis in [15] shows that this spatial dis- 
cretization is dissipative in combination with the Runge-Kutta methods from 
[7] mentioned in Section 2.5 the method of JTH] mentioned there is analyzed 
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in the appendix of [18]). However, this property comes at the cost of an addi- 
tional restriction to the stability constraint. Monotonic decay of the numerical 



solution is ensured, if the diffusive Courant Number in (2.5) satisfies 



CFLa\ 



diff 2 max 



max 



(48) 



The reader will notice that the dissipativity analysis has been performed as- 
suming (47) to be a finite difference scheme, whereas (47) was derived in a 



finite-volume framework (e.g. equation (|42|)). The justification for mixing both 
approaches is given by Shu in [30] . He shows that for one-dimensional conser- 
vation laws the finite-volume and the finite-difference WENO reconstruction 
procedure is essentially the same. The only difference is found in the input- 
output pair: for a finite volume scheme, the input is the set of cell averages 
and the output is the set of reconstructed values of the solution at the cell 
interfaces; for a finite-difference scheme, the input is the set of point values of 
the physical flux and the output is the set of numerical fluxes at the cell inter- 
faces. The transition to several dimensions is obtained by dimensional splitting 
as common in finite difference methods. Note that the important part of the 
derivation given here is actually the intermediate result (42)-(46): this is the 
way the stencil is implemented in ANTARES and it ensures the property of 
flux conservation and easy implementation into the WENO-framework when 
combined with dimensional splitting. 



5 Numerical results and further tests 



In all the following simulations we use the second order TVD Runge-Kutta 
method of [7] for time integration. We demonstrate the potential of the im- 
plicit time integration of the pressure terms through the approach of [19] and 
with the extensions derived in the previous sections for a number of problems 
including idealized cases, benchmark tests and astrophysical applications. We 
also show that the method performs extremely well on highly parallelized 
supercomputing platforms. 



5.1 Speed-up of simulations of convection and semiconvection 



We first rerun the simulation of double-diffusive convection shown in Figure 
[3] employing the dissipative finite difference scheme derived in the previous 
section. Although the new stencil entails a more severe timestep restriction, we 
choose the Courant number such that in both simulations the same timestep 
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Fig. 3. Rerun of the simulation described in Section 3.3 with strictly dissipative 
spatial discretization. The picture shows a horizontal slice of x-momentum after 47 
scrt. There is no indication of spurious oscillations. 

is used. We see in Fig. [3] that due to the property of strict dissipativity our 
improved solver efficiently damps the high frequency oscillations on the grid 
scale and leads to a smooth and stable solution. 

As we have already mentioned, the explicit counterpart of the fractional step 
method, namely, employing the WENO scheme for the entire hyperbolic part 



and discretizing the viscous terms via the dissipative scheme (42) has rendered 



stable results. Similarly, the Boussinesq equations discretized with the same 
techniques as the explicit solver in [31] yielded stable results even at much 
higher timesteps. For in the present fractional step method we suspect that 
some instability is introduced by the discretization of equation ([7]), where the 
advective term and the source term are discretized using an explicit Euler- 
forward timestep whereas the diffusive pressure is defined implicitly at the 
new timestep. 

However, we rule out the solver for the Poisson-like equation as a sole initiator 
of the instabilities, partly, because the same solver is successfully employed 
solving the Boussinesq equations in [31] and partly due to the satisfyingly 
stable results the solver renders even, if slight oscillations are present in the 



coefficient functions (see Section (2.4.2)). Nevertheless, one property of the 
generalized Poisson equation is that information is propagated infinitely fast 
such that any instabilities appearing in the numerical solution travels instan- 
taneously through the entire domain. This might favor the build-up of the 
observed oscillations. 

We conclude that the fractional step method may be stabilized by employing 
a strictly dissipative discretization scheme. 



Having successfully enhanced the longterm stability of the method, we per- 



24 



formed a series of simulations of stellar convection and double-diffusive con- 
vection. Such a simulation is expected to evolve the following way: after the 
initial vertical oscillations^] are damped out, the velocity field slowly starts 
building up. Large scale gravity waves form and break, causing the fluid to 
mix locally. The turbulent mixing process spreads over the whole domain and 
after a while, quasi-stationary convective rolls form. 

In case of the double-diffusive convection scenario, the main difference to the 
chemically homogeneous convective scenario is that the stable mean molecular 
weight gradient delays the build-up of the velocity field, leading to a consid- 
erable prolongation of the diffusive phase. By the term "diffusive phase" we 
refer to the first part of the simulation, where the fastest timescale is set by 
diffusion processes. Eventually, the fluid velocity increases such as to pose an 
even greater restriction on the timestep than diffusion. Once this happens, the 
simulation has reached the "advective phase" . 

Figure [4] shows the timestep evolution in a simulation of stellar convection. The 
maximum timestep advantage compared to the explicit scheme is achieved in 
the diffusive phase of the simulation, which lasts in this case approximately 30 
scrt. The build-up of the velocity field is best followed by looking at the mean 
Mach number. In the range of 10-20 scrt the velocity steadily increases, until 
the first gravity waves break at approximately 22 scrt. The mixing process 
slows the growth of the velocity until it reaches a steady state of convective 
motions (at approximately 55 scrt). At a certain point, the fluid velocity is 
high enough to pose a greater restriction on the timestep than diffusion. Nev- 
ertheless, even in the final state with quasi-stationary averaged properties the 
timestep employed is still twice the sound-speed based timestep. 

Altering the parameters of the setting of stellar convection, however, leads to 
a simulation where much higher fluid velocities are encountered. Although the 
efficiency gain in the timestep is limited to the initial diffusive phase, we give 
this example to show that unlike methods especially tailored to the low Mach 
number regime, the fractional step method is capable of passing smoothly to 
regimes where Mach numbers of almost 1 are encountered. The evolution of 
the mean Mach number as well as the timesteps are shown in Figure [5j 

Figure [6] depicts the timestep evolution in a simulation of semiconvection. 
Here we encounter a real low Mach number regime since the Mach number 
never exceeds 0.2 and we see that the relevant timestep restriction stems from 



These oscillations arise because due to rounding errors and due to the different 
truncation errors of the integration of the initial condition and the spatial discretiza- 
tion of the dynamical equations in the setup of the starting model the simulation is 
not in perfect hydrostatic equilibrium. 
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diffusive processes which leads to a timestep nearly 5 times higher than r sn d. 
Table [4] lists the mean and maximum timesteps achieved in the simulations. 




Fig. 4. Simulation of stellar convection with parameters Pr = 1.0 and Ra* = 160000. 
The spatial resolution is 150 x 150 grid points and we use a Courant number 
CFL a dv = CFLdiff = 0.2. Simulation time is 70 scrt. 



setting 


7"max 


7"mn 


7"exp 


''"max/ 7~exp 


Tmn/ T~exp 


Convection, Pr=1.0 


42.36 s 


32.03 s 


7.07 s 


6.0 


4.53 


Convection, Pr=0.1 


5.91 s 


3.73 s 


2.64 


2.24 s 


1.41 


Semiconvection 


26.73 s 


26.5 s 


6.57 s 


4.07 


4.03 



Table 4 

Maximum and mean timesteps (T max resp. r mn ) achieved in the simulation of con- 
vection and double diffusive convection. To highlight the gain in performance, we 
give the ratio of the achieved timesteps and the sound speed based timestep r exp . 



5.2 Convergence Test 



To demonstrate the accuracy of our solver, we performed a grid refinement 
study of a double-diffusive single layer simulation with parameters Pr = 0.1, 
Le = 0.1, R p = 1.1 and Ra* = 1.6 • 10 6 . We successively increased the res- 
olution, starting at a simulation with 200x200 grid points and arriving at a 
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Fig. 5. Simulation of stellar convection with parameters Pr = 0.1 and Ra* = 160000. 
The spatial resolution is 400 x 400 grid points and we use a Courant number 
CFL a dv = CFLdift = 0.2. Simulation time is 100 scrt. 



maximum resolution of 800x800 grid points. In view of our criteria (39) and 



(40) for resolving also the smallest structures in our simulations, the coarsest 



grid resolves the Helium boundary layer with about 3 grid points (which is 
quite low) whereas on the finest grid, the diffusive layer is resolved with 12 
grid points. 



We compare the error between sucessive grid resolutions, i.e. we compare the 
error between the simulations with 200x200 grid points and 400x400 grid 
points to the error between the simulations with 400x400 grid points and 
800x800 grid points. As expected, the error between successively refined grids 
decreases. This strongly indicates that our method is convergent. 

In the first 50 scrt, where the simulation is dominated by diffusion processes, 
the relative errors in density and pressure are nearly constant and the error 
between the simulations with 400x400 and 800x800 grid points is considerably 
smaller than the error between the simulations with 200x200 and 400x400 
grid points. However, a comparison for a given point in time has only limited 
meaning, since the solution changes its nature as a function of time. Initial 
vertical oscillations are damped out and the velocity field slowly starts building 
up. Subsequently, gravity waves form and break. Turbulence sets in and the 



27 



100 



80 



60 



40 



l snd 
^therm 
x fluid 
"^visc 



'* < IS' 



Mil 



20 - 



I 1 1 1 1 1 

100 200 300 400 500 

scrt 



E 




100 200 300 400 500 

scrt 



Fig. 6. Simulation of double-diffusive convection with parameters Pr = 1.0, 
Ra* = 160000, Le = 0.5 and R p = 1.1. The spatial resolution is 150 x 150 grid 
points and we use a Courant number CFL ac j v = CFL^d = 0.2. 

solution reaches a statistically stationary state. Due to turbulence a direct 
comparison of the solutions no longer provides meaningful information about 
the discretization. 



5. 3 Scalability of the fractional step method 



With the advent of high performance supercomputers, efficient parallelization 
techniques and the scalability of algorithms have become vital for any scientific 
computation. Although we have already demonstrated the scalability of our 
solver for the generalized Poisson equation in Section 2.4 we would like to 
demonstrate the scalability of the overall fractional step method, especially, 
since there are recent publications like [11J stating that algorithms containing 
an elliptic equation are limited in parallel efficiency due to the overhead the 
parallel solution of the elliptic equation incurs. 



To demonstrate that such claims of problematic scalability properties do not 
hold for the fractional step method, we use the setting of stellar convection 
with parameters Pr = 0.1 and Ra* = 160000. In a first test series, we dis- 
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Fig. 7. Comparison of the error between successively refined grids. 



cretize the domain with 1600x1600 grid points and advance the simulation 
0.5 soundcrossing times employing a different number of CPUs. Table [5] lists 
the wallclock times. Indeed, with 64 times the number of cores the code is 
about 49 times faster which is similar to the scaling reported in Table 2 (for 



a different computer architecture) for the plain solver for (32). Note that the 
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single-core run has not been performed for the entire 0.5 scrt due to the ex- 
cessively long runtime estimated to be about 7.3 days. However, the overhead 
of the Schur complement method quantified in Section 2.4 to be a factor 12 
reduces here to a factor of 2.5. This may be due to memory issues - arrays as 
large as in the present testcase do not fit easily into the cache memory and a 
full hydrodynamical simulation requires definitely more memory than the sim- 
ple solution of the generalized Poisson equation. The emerging latencies due 
to cache misses considerably slow down the calculation. Clearly, this demon- 
strates strong scaling of the fractional step method over 3 orders of magnitude 
in number of processor cores. 



The second suite of tests is designed to analyze the behaviour of our solver 
if more than 1000 cores are used. To reasonably employ each core we use the 
same setting as in the first test series, but discretize the computational domain 
with 3200x3200 grid points. The wallclocktimes given in Table [6] indicate that 
we have optimal scaling up to 2048 cores. However, employing 4096 cores, 
there is almost no acceleration compared to the 2048-core run. In order to ex- 
clude the solver for the generalized Poisson equation as the cause for this lack 
of efficiency, we have rerun the same test with the fully explicit solver. The 
performance of the fully explicit solver is quantified in Table [7] and again, the 
employment of 4096 cores does not show a significant advantage over the run 
with 2048 cores. Therefore, we attribute this lack of acceleration to the specific 
network structure of the VSC2. Furthermore, with an appropriate paralleliza- 
tion strategy (multigrid solvers for systems of linear equations), corresponding 
solution techniques are reported to scale up to 290 000 cores [6]. 



Note that due to the exorbitant number of 3200 grid points we use per di- 
rection in this test, the dominant timestep restriction stems from diffusive 
processes. Hence, both solvers use the same time increment and since the 
solution of the generalized Poisson equation required by the fractional step 
method does take time, especially, if many grid points are used, the explicit 
solver is computationally less expensive. We also note here that the number 
of grid points per domain at 4096 cores is the same as for the test with 1600 
grid points per directin on 1024 cores. Thus, the size of the transferred data 
per domain is the same for both cases, but the total data transfer over the 
network increases for the larger problem. 



5.4 Validation in the High Mach Number Regime 



To validate the shock-capturing capability of the fractional step method, we 
have rerun the standard Sod shock tube test as outlined in [T9]. We use a 
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4h corps 


time in s 


1 
1 


Estimate: 7.3 days 


10 


no. co. en 
Zo.OZ.QU 


64 


05:53:13 


256 


01:20:02 


1024 


00:26:32 



Table 5 

Fractional step method scaling test (1600x1600 grid points), calculated at the Vi- 
enna Scientific Cluster 2 (VSC2). 



# cores 


time in s 


256 


24:15:00 


1024 


06:39:44 


2048 


04:44:25 


4096 


04:38:15 



Table 6 

Fractional step method scaling test (3200x3200 grid points), calculated at the Vi- 
enna Scientific Cluster 2 (VSC2). 



# cores 


time in s 


1024 


02:11:21 


2048 


01:42:29 


4096 


01:22:03 



Table 7 

Explicit solver scaling test (3200x3200 grid points), calculated at the Vienna Scien- 
tific Cluster 2 (VSC2). 

computational domain of 1 cm and 405 grid points. The initial condition reads 

f (1,0,1) if x< 0.5cm , N 

p, u, P = < s ~ 49 

I 0.125,0,0.1 if x > 0.5cm 



Figure [8] shows a snapshot of the Sod shocktube after t = 0.25s. The solution 
of the fractional step method is plotted against an explicit reference solution. 
The results indicate well resolved shock, rarefaction, and contact solutions 
travelling at the correct shock speeds. 

We also rerun the two-dimensional circular shock test proposed by [TU]. The 
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Fig. 8. Results of the Sod shocktube test at t=0.25 s. Density and Pressure are 
plotted against an explicit reference solution. The results indicate well resolved 
shock waves. 



initial condition for this test is given by 



(p,u,v,P) 



1,0,0,1) if r< 40cm 

(0.125,0,0,0.1) if r> 40cm 



(50) 



We use a domain of 200 cm x 200 cm and discretize it with 95 x 95 grid points. 
Again, the results depicted in Figure [9] indicate well resolved shocks. 



5. 5 Validation in the Low Mach Number Regime 



5.5.1 Smooth Flow Test 



In |19j . N. Kwatra et al report to have achieved a CFL number of 300 in a 
smooth flow test with initial conditions 



m(x,0) = 

P(x,0) = P + eP 1 (x) 
Pi {%) — 6 cos(27rx) + 10 sin(47nr) 

'P(x,0)~ 



p(x,0) 



7 is the adiabatic index of the gas and p = 1, P = 10 3 , e = 1. 

Contrary to [19J, we use a fixed grid resolution of 800 grid points and increase 
the Courant number. The timing results in Table [8] show that the semi-implicit 
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(d) x-Momentum (semi- (e) x-Momentum (explicit) (f) x-Momentum 

implicit) 

Fig. 9. The depicted panels are snapshots of the two-dimensional circular shock test. 
On the upper left, the density at time t=16.4 s of the simulation performed with 
the fractional step is shown, next to it the explicitly calculated reference solution. 
The panel on the right hand side depict a slice of density where the fractional step 
solution (red) and the fully explicit solution (green) are compared. In the lower 
part, the same applies to snapshots of the x-momentum. 



method is far more efficient than the explicit method even when the same 
time increment is used. This was already remarked by [19] and predominantly 
attributed to the avoidance of the characteristic decomposition in the ENO 
scheme. However, in the scaling test in Section 5J3 quite the opposite was 
true, namely, the explicit solver was considerably faster than the semi-implicit 
method when the same timestep was employed. We attribute this to the num- 
ber of grid points - in the smooth flow test, we discretize the domain with 100 
grid points whereas in the scaling test we employed 3200x3200 grid points. 
Apparently, as the number of grid points increases, the gain in efficiency due 
to the avoidance of the transformation onto the characteristic space is can- 
celled by the time spent in the solver for the generalized Poisson equation. 
Note that in the Smooth Flow test, the maximum time increment is 600 times 
that one of the explicit method, its wallclocktime 234 times smaller than the 
explicit method and 105 times smaller than the semi-implicit method at the 
same small time-increment. The loss of a factor of about 6 in efficiency is due 
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to the larger number of iterations during the solution of (32) needed for the 
much larger time-steps. We also remark that although it is possible to employ 
a CFL-number of 300 without running into numerical difficulties it is not ad- 
visable to do so for long integration times since the dissipation added by the 
WENO-scheme damps the original flow pattern considerably. 

Figure [To] depicts snapshots of pressure in the smooth flow test run at different 
Courant numbers. 



method 


CFL-number 


timestep At 


Wallclocktime 


explicit 


0.5 


1.53 x 10" 8 


01:18:04 


semi-implicit 


0.5 


1.53 x 10" 8 


00:35:04 


semi-implicit 


3 


9.18 x 10" 8 


00:06:33 


semi-implicit 


30 


9.18 x 10~ 7 


00:01:59 


semi-implicit 


300 


9.18 x 10~ 6 


00:00:20 



Table 8 

Timing results from the smooth flow test for different Courant numbers. Simulation 
time is t=2.5 x 10~ 5 s. 

1.00015e+09 



1.0001e+09 



1.00005e+09 



9.9995e+08 



9.999e+08 



9.9985e+08 




100 200 300 400 500 600 700 800 900 
grid points 



Fig. 10. Numerical results comparing the pressure at t=1.25 x 10 5 s in the smooth 
flow test run with different CFL numbers. 



5.5.2 The Gresho Vortex test 

The Gresho vortex is a time-independent rotation pattern. Angular velocity 
depends only on the radius and centrifugal force is balanced by the pressure 
gradient. The original setup is found in [21J. We use the slightly modified 
initial condition of [22], which permits the variation of the Mach number. 

We use a Cartesian domain [0,1] x [0,1] and employ periodic boundary 
conditions. The initial condition is given dependent on the radius r = 
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yj(x - 0.5) 2 + (y - 0.5) 2 as 



p = 1.0 (51) 

f5r if < r < 0.2 

i^=|2-5r if0.2<r<0.4 (53) 
[o if0.4<r 

'P + fr 2 if0<r<0.2 

Po + f r 2 + 4 (1 - 5r - ln(0.2) + ln(r)) if 0.2 < r < 0.4 (54) 

^0-2 + 4 1x1(2) if0.4<r 



Ma denotes the Mach number and w^, denotes the angular velocity. The carte- 
sian velocity components are obtained via 

u x = sin(0) «0 (55) 
u y = cos (9) tt^ (56) 

where 9 = atan2(y — 0.5, x — 0.5). 

We run the Gresho vortex test with Ma = 0.1, Ma = 0.01 and Ma = 0.001. 



Figure [TT] compares the results obtained in this setting by the semi-implicit 
method and the fully explicit solver. In this test, both solvers use, for proper 
comparison, the same sound-speed induced timestep and a Courant number 
of 0.5. The simulation time is 2 sec. 

The results show that for Ma = 0.1 both solvers yield similar results, al- 
though the dissipation of kinetic energy is higher in the explicit solver (see 



also Figure 12). However, as the Mach number drops, the solution of the ex- 
plicit solver becomes highly inaccurate. This breakdown is well known in the 
literature and attributed to the fact that the pressure term is of order I /Ma 2 , 
which introduces considerable inaccuracy as the Mach number approaches 
(see P2[14,34j). However, in JI3] it is shown that within pressure based meth- 
ods such as the fractional step method of Kwatra et al. (19] the pressure 
variation remains finite, irrespectively of the Mach number. This is vividly 
demonstrated by the highly accurate results of the semi-implicit method even 
at Mach numbers as low as 10~ 3 . 

Since the semi-implicit method theoretically allows the simulation to be ad- 
vanced with a timestep higher than the one induced by the sound speed, 
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(a) Ma=0.1, t=0 s 



(b) Ma=0.1, t=2 s, exp. 



(c) Ma=0.1, t = 2 s, sem.i. 




(d) Ma=0.01, t=0 s (e) Ma=0.01, t=2 s, exp. (f) Ma=0.01, t=2 s, sem.i. 




(g) Ma=0.001, t=0 s (h) Ma=0.001, t=2 s, exp. (i) Ma=0.001, t=2 s, sem.i. 

Fig. 11. Comparison of the Gresho vortex test performed with the explicit and the 
semi-implicit solver with different Mach numbers. On the left, the initial configu- 
ration is depicted. In the center and on the right, the simulation is advanced 2 sec 
with the explicit solver and the semi-implicit solver, respectively. 



we have rerun the Gresho vortex test with Ma = 0.001 and the semi-implicit 
solver, advancing it by multiples of the sound-speed induced timestep. We find 
that the simulation remains stable for up to four times the explicit timestep. 
Figure 13 shows no adverse effects on the dissipation from choosing a larger 
timestep. 
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Ma=0.1 



explicit solver 
semi-implicit solver 





t in [s] 



Fig. 12. We compare the relative kinetic energy predicted by the simulations over 
time. Whereas the dissipation of the explicit solver increases dramatically as the 
Mach number drops, the dissipation of the semi-implicit method is independent of 
the Mach number for exactly the same number of timesteps. 



6 Conclusions and Outlook 



We have extended the numerical method of [19J to viscous two-component 
flows and improved the stability of the overall scheme by employing a strictly 
dissipative finite difference scheme for the parabolic part. We have applied 
it to simulations of stellar convection and stellar semiconvection, where es- 
pecially the latter is situated in the low Mach number regime. Apart from 
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t in [s] 

Fig. 13. Courant number tests with the semi-implicit method. Advancing the Gresho 
vortex with Ma = 0.001 with a greater timestep does not affect the dissipation of 
the scheme. 

rendering highly accurate results, the implicit treatment of sound waves al- 
lows the employment of a much larger timestep to advance the simulations in 
time. Contrary to most solvers designed for low Mach number regimes, this 
method is capable of passing smoothly to higher Mach numbers. This makes it 
especially suited to problems, where a wide range of parameter sets is covered. 
A prominent example of such a setting is the solar convection zone, where at 
the bottom low Mach numbers of order O(10 -4 ) are encountered, whereas at 
the top of the same convection zone Mach numbers of order 0(1) are reached 
at least locally inside fast downdrafts, (see [5fT7] ). 

We have also validated the shock capturing capability of the fractional step 
method as well as its performance in very low Mach number regimes. We find 
that the fractional step method renders stable and accurate results even in 
regimes with Ma = 0.001. 

By employing a highly parallelized solver for the generalized Poisson equation, 
we show that the method is suitable for modern high performance computing. 
We demonstrate the strong scaling of the method over three orders of magni- 
tude in number of processor cores in a realistic astrophysical application. 

Given the beneficial properties of this low Mach number solver outlined in 
this paper, we conclude that it poses a real alternative to traditional low 
Mach number schemes such as the Boussinesq or the anelastic approximation 
which are unreliable in the transition region of moderate Mach numbers (Ma 
w 0.1) and violate various conservation properties (mass in the case of the 
Boussinesq approximation and energy in case of the anelastic approximation 
of [28]) 

However, in some settings within our parameter range, the timestep restric- 
tion due to diffusive processes is also severe and in some cases it is even more 
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Fig. 14. Simulation of double-diffusive convection with parameters Pr = 1.0, 
Ra = 160000, Le = 0.01 and R p = 1.3. The spatial resolution is 600 x 400 grid 
points. Note that in this setting diffusion poses an even greater restriction on the 
timestep than sound waves. 



restrictive than the acoustic timestep restriction (see Figure 14). Note that the 
resolution is somewhat lower in this case (only 3 grid points resolve the diffu- 
sive boundary layer), but for the present purpose this is sufficient, especially 
since at high resolution the diffusive timescales are even more restrictive. 



Following the idea of integrating terms describing processes acting on short 
timescales implicitly, we have combined total-variation diminishing implicit- 
explicit (TVD IMEX) Runge-Kutta methods with our solver. This combina- 
tion treats the diffusive terms implicitly and therefore alleviates the timestep 
restrictions due to diffusive processes. In [TH] TVD IMEX Runge-Kutta meth- 
ods are analysed in detail and the fruitful junction of our solver and those 
methods is demonstrated. 
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